We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE). Our team participated in all three subtasks: (i) sentence- and word-level quality prediction; (ii) explainable QE; and (iii) critical error detection. For all tasks we build on top of the COMET framework, connecting it with the predictor-estimator architecture of OpenKiwi and equipping it with a word-level sequence tagger and an explanation extractor. Our results suggest that incorporating references during pretraining improves performance on downstream tasks across several language pairs, and that training jointly with sentence- and word-level objectives yields a further boost. Moreover, combining attention and gradient information proved to be the top strategy for extracting good explanations from sentence-level QE models. Overall, our submissions achieved the best results in all three tasks for almost all language pairs.
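For intuition, here is a minimal sketch of the attention-and-gradient strategy for scoring input tokens as explanations; the model interface (a HuggingFace-style encoder returning a scalar quality score in `out.score` plus per-layer attentions) is an assumption for illustration, not the IST-Unbabel implementation:

```python
import torch

def explain_tokens(model, input_ids, attention_mask):
    """Score each token by combining gradient saliency with the attention
    it receives -- a sketch of the attention x gradient heuristic,
    assuming `model` exposes a scalar quality score as `out.score` and
    attentions via output_attentions=True (hypothetical interface)."""
    embeddings = model.get_input_embeddings()(input_ids)
    embeddings.retain_grad()
    out = model(inputs_embeds=embeddings,
                attention_mask=attention_mask,
                output_attentions=True)
    out.score.sum().backward()  # d(quality score) / d(input embeddings)

    # Gradient saliency: L2 norm of the embedding gradient per token.
    grad_saliency = embeddings.grad.norm(dim=-1)          # (batch, seq)
    # Attention saliency: attention each token receives, pooled over
    # layers, heads, and query positions.
    attn = torch.stack(out.attentions).mean(dim=(0, 2))   # (batch, q, k)
    attn_saliency = attn.mean(dim=1)                      # (batch, seq)
    return grad_saliency * attn_saliency  # higher = stronger explanation
```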
Position modeling plays a critical role in Transformers. In this paper, we focus on length extrapolation, i.e., training on short texts while evaluating on longer sequences. We define attention resolution as an indicator of extrapolation, and propose two designs that improve this metric for Transformers. Specifically, we introduce a relative position embedding that explicitly maximizes attention resolution. Moreover, we use blockwise causal attention during inference for better resolution. We evaluate different Transformer variants on language modeling. Experimental results show that our model achieves strong performance in both interpolation and extrapolation settings. The code will be available at https://aka.ms/LeX-Transformer.
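As a concrete illustration of the inference-time design, here is a minimal sketch of a blockwise causal attention mask; the block size and the "current plus previous block" window are illustrative assumptions, not the paper's exact configuration:

```python
import torch

def blockwise_causal_mask(seq_len: int, block_size: int) -> torch.Tensor:
    """Boolean mask (True = may attend) where each query attends causally
    within its own block and to the full previous block, so a model
    trained on block_size-length texts never faces a wider attention
    window at inference time. Illustrative sketch, not the released code."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions
    k = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = k <= q
    same_or_prev_block = (q // block_size - k // block_size) <= 1
    return causal & same_or_prev_block

# Example: trained on length-4 texts, evaluated at length 8.
print(blockwise_causal_mask(8, 4).int())
```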
Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans. In this research, we examine user utterances as causes and generated responses as effects, recognizing that changes in a cause should produce a different effect. To further explore this concept, we have compiled and expanded upon a new dataset called CausalDialogue through crowd-sourcing. This dataset includes multiple cause-effect pairs within a directed acyclic graph (DAG) structure. Our analysis reveals that traditional loss functions can struggle to effectively incorporate the DAG structure, leading us to propose a causality-enhanced method called Exponential Maximum Average Treatment Effect (ExMATE) to enhance the impact of causality at the utterance level in training neural conversation models. To evaluate the effectiveness of this approach, we have built a comprehensive benchmark using the CausalDialogue dataset leveraging large-scale pre-trained language models, and have assessed the results through both human and automatic evaluation metrics for coherence, diversity, and agility. Our findings show that current techniques are still unable to effectively address conversational DAGs, and that the ExMATE method can improve the diversity and agility of conventional loss functions while maintaining coherence.
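The abstract does not spell out the ExMATE objective, but roughly, a causality-enhanced loss can contrast a response's likelihood under its true cause with its likelihood under an alternative cause from the DAG. The following is a hypothetical sketch of such an objective, not the paper's exact formula:

```python
import torch

def exmate_like_loss(logp_true_cause: torch.Tensor,
                     logp_alt_cause: torch.Tensor,
                     beta: float = 1.0) -> torch.Tensor:
    """Hypothetical causality-enhanced objective: standard NLL of the
    response under its true cause, plus an exponentially weighted penalty
    when the response is just as likely under a different cause from the
    dialogue DAG. Illustrates the idea of boosting the utterance-level
    treatment effect; NOT the exact ExMATE formula.

    logp_true_cause / logp_alt_cause: per-example log P(response | cause).
    """
    nll = -logp_true_cause.mean()
    effect_gap = logp_true_cause - logp_alt_cause  # treatment-effect proxy
    penalty = torch.exp(-beta * effect_gap).mean()
    return nll + penalty
```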
Business processes that involve AI-powered automation have been gaining importance and market share in recent years. These business processes combine the characteristics of classical business process management, goal-driven chatbots, conversational recommendation systems, and robotic process automation. In the new context, prescriptive process monitoring demands innovative approaches. Unfortunately, data logs from these new processes are still not available in the public domain. We describe the main challenges in this new domain and introduce a synthesized dataset that is based on an actual use case of intelligent process automation with chatbot orchestration. Using this dataset, we demonstrate crowd-wisdom and goal-driven approaches to prescriptive process monitoring.
Out-of-distribution (OOD) detection has attracted a large amount of attention from the machine learning research community in recent years due to its importance in deployed systems. Most of the previous studies focused on the detection of OOD samples in the multi-class classification task. However, OOD detection in the multi-label classification task remains an underexplored domain. In this research, we propose YolOOD - a method that utilizes concepts from the object detection domain to perform OOD detection in the multi-label classification task. Object detection models have an inherent ability to distinguish between objects of interest (in-distribution) and irrelevant objects (e.g., OOD objects) on images that contain multiple objects from different categories. These abilities allow us to convert a regular object detection model into an image classifier with inherent OOD detection capabilities with just minor changes. We compare our approach to state-of-the-art OOD detection methods and demonstrate YolOOD's ability to outperform these methods on a comprehensive suite of in-distribution and OOD benchmark datasets.
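As a hedged sketch of the core idea, a YOLO-style head's per-cell outputs can be pooled into a single in-distribution score per image; the tensor shapes and the max-then-sum aggregation are assumptions for illustration, not necessarily YolOOD's exact rule:

```python
import torch

def yolood_style_score(objectness: torch.Tensor,
                       class_probs: torch.Tensor) -> torch.Tensor:
    """Aggregate a detection head's outputs into one in-distribution score
    per image: for each class, take the strongest evidence
    (objectness * class confidence) over all cells, then sum over classes.
    Low scores suggest an OOD input. Illustrative assumption only.

    objectness:  (batch, cells)          sigmoid objectness per cell
    class_probs: (batch, cells, classes) sigmoid class confidences
    """
    evidence = objectness.unsqueeze(-1) * class_probs   # (B, cells, C)
    per_class = evidence.max(dim=1).values              # (B, C)
    return per_class.sum(dim=-1)                        # (B,)
```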
We present the UC$^3$RL algorithm for regret minimization in Stochastic Contextual MDPs (CMDPs). The algorithm operates under the minimal assumptions of a realizable function class and access to offline least-squares and log-loss regression oracles. Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys an $\widetilde{O}(H^3 \sqrt{T |S| |A|(\log (|\mathcal{F}|/\delta) + \log (|\mathcal{P}|/ \delta) )})$ regret guarantee, with $T$ being the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon, and $\mathcal{P}$ and $\mathcal{F}$ finite function classes used to approximate the context-dependent dynamics and rewards, respectively. To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs that operates under the general offline function approximation setting.
We address the general task of structured commonsense reasoning: given a natural language input, the goal is to generate a graph such as an event graph or a reasoning graph. To employ large language models (LMs) for this task, existing approaches ``serialize'' the output graph as a flat list of nodes and edges. Although feasible, these serialized graphs deviate strongly from the natural language corpora that LMs were pre-trained on, hindering LMs from generating them correctly. In this paper, we show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language, even when the downstream task does not involve source code at all. We demonstrate our approach across three diverse structured commonsense reasoning tasks. On all three of these natural language tasks, we show that, using our approach, a code-generation LM (CODEX) outperforms natural-language LMs fine-tuned on the target task (e.g., T5) and other strong LMs such as GPT-3 in the few-shot setting.
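A toy illustration of the reframing (the `Node` structure and all names here are invented for illustration; COCOGEN's actual prompt format may differ): instead of asking the LM to emit a flat node/edge list, the same script graph is expressed as ordinary Python of the kind a code LM saw throughout pre-training.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str
    children: list = field(default_factory=list)

    def add(self, text: str) -> "Node":
        child = Node(text)
        self.children.append(child)
        return child

# Flat serialization a natural-language LM would have to emit:
#   "make coffee | boil water -> make coffee | grind beans -> make coffee"
# The same graph framed as code generation:
goal = Node("make coffee")
goal.add("boil water")
goal.add("grind beans")
```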
In recent years, the transmission of images to remote servers for computer vision purposes has increased dramatically. In many applications, such as surveillance, images are mostly analyzed automatically and only rarely viewed by humans. In this context, traditional compression is inefficient in terms of bitrate, likely due to its focus on human-oriented distortion metrics. It is therefore important to create image coding methods intended for joint use by humans and machines. One approach to the machine side of such a codec is to match the features of certain intermediate layers of a deep neural network that performs the machine task. In this work, we explore the effect of the layer selection used when training such a learnable codec for humans and machines. We show, using the data processing inequality, that matching features from deeper layers is preferable in a rate-distortion sense. Next, we empirically confirm our findings by retraining an existing scalable human-machine coding model. In our experiments, we show the trade-off between the human and machine sides of such a scalable model, and discuss the benefits of using deeper layers for training in this context.
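A minimal PyTorch sketch of the feature-matching idea, assuming the task network is an `nn.Sequential` and MSE matching (both assumptions for illustration): the loss compares the features a frozen task network computes for the original image with those it computes for the codec's output, at a configurable depth.

```python
import torch
import torch.nn as nn

def feature_matching_loss(task_net: nn.Sequential,
                          x_original: torch.Tensor,
                          x_decoded: torch.Tensor,
                          layer: int) -> torch.Tensor:
    """Train the machine side of a human/machine codec by matching the
    features that the first `layer + 1` layers of the (frozen) task
    network produce for the original vs. the decoded image. Deeper
    `layer` values are what the analysis above favors in a
    rate-distortion sense. Sketch only, not the authors' exact setup."""
    trunk = task_net[: layer + 1]
    with torch.no_grad():
        target = trunk(x_original)   # features of the pristine input
    pred = trunk(x_decoded)          # features of the codec's output
    return nn.functional.mse_loss(pred, target)
```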
Many challenging real-world problems require the deployment of ensembles of multiple complementary learning models to reach acceptable performance levels. While effective, applying the entire ensemble to every sample is costly and often unnecessary. Deep reinforcement learning (DRL) offers a cost-effective alternative, in which detectors are selected dynamically based on the output of their predecessors, with their utility weighed against their computational cost. Despite their potential, DRL-based solutions are not widely used in this capacity, partly because of the difficulty of configuring a reward function for each new task, the unpredictable reactions of DRL agents to changes in the data, and the inability to use common performance metrics (e.g., TPR/FPR) to guide the algorithm's behavior. In this study, we propose methods for fine-tuning and calibrating DRL-based policies so that they can meet multiple performance goals. Moreover, we present a method for transferring effective security policies from one dataset to another. Finally, we demonstrate that our approach is highly robust against adversarial attacks.
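Schematically, the cascade such a DRL agent controls looks like the loop below; the `policy`/`detectors` interface and the mean-score verdict are hypothetical, sketching the idea rather than the authors' implementation:

```python
import numpy as np

def run_cascade(policy, detectors, costs, sample):
    """Query detectors sequentially under a learned stopping policy:
    after each detector's score is observed, the policy either stops
    and classifies, or pays the cost of the next ensemble member.
    `policy(scores) -> bool` (True = continue) is a hypothetical
    interface illustrating the DRL cascade idea only."""
    scores, total_cost = [], 0.0
    for detect, cost in zip(detectors, costs):
        if scores and not policy(np.asarray(scores, np.float32)):
            break                       # agent judged more detectors unnecessary
        scores.append(detect(sample))   # run the next ensemble member
        total_cost += cost
    verdict = float(np.mean(scores))    # combine only the detectors used
    return verdict, total_cost
```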
As observed in recent works, the quality of signal propagation in message-passing graph neural networks (GNNs) strongly influences their expressivity. In particular, for prediction tasks that rely on long-range interactions, the recursive aggregation of node features can lead to an undesirable phenomenon called "over-squashing". We present a framework for analyzing over-squashing based on information contraction. Our analysis is guided by a model of reliable computation due to von Neumann, which yields a new view of over-squashing as signal quenching in noisy computation graphs. Building on this, we propose a graph rewiring algorithm aimed at alleviating over-squashing. Our algorithm employs a random local edge flip primitive motivated by an expander graph construction. We compare the spectral expansion properties of our algorithm with those of an existing curvature-based non-local rewiring strategy. Synthetic experiments show that, although our algorithm generally has a slower rate of expansion, it is computationally cheaper overall, preserves node degrees exactly, and never disconnects the graph.
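For concreteness, here is a small sketch of a degree-preserving local edge flip of the kind such expander-motivated rewiring builds on (the paper's exact primitive may differ). Because the pivot edge (u, v) is kept, any path through a removed edge can be rerouted through it, so degrees and connectivity are both preserved:

```python
import random
import networkx as nx

def random_local_edge_flip(G: nx.Graph) -> bool:
    """Attempt one degree-preserving local flip: pick an edge (u, v),
    a neighbor a of u and a neighbor b of v, and swap the edges
    (u, a), (v, b) for (u, b), (v, a). Illustrative sketch of the
    flip idea, not necessarily the paper's exact primitive."""
    u, v = random.choice(list(G.edges()))
    nu = [n for n in G[u] if n != v]
    nv = [n for n in G[v] if n != u]
    if not nu or not nv:
        return False                  # no valid neighbors to flip
    a, b = random.choice(nu), random.choice(nv)
    if a == b or G.has_edge(u, b) or G.has_edge(v, a):
        return False                  # invalid flip; caller may retry
    G.remove_edges_from([(u, a), (v, b)])
    G.add_edges_from([(u, b), (v, a)])
    return True
```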